ABSTRACT

Seven linker histone H1 variants are present 
in human somatic cells with distinct prevalence 
across cell types. Despite being key structural 
components of chromatin, it is not known whether 
the different variants have specific roles in the 
regulation of nuclear processes or are differentially 
distributed throughout the genome. Using 
variant-specific antibodies to H1 and hemagglutinin 
(HA)-tagged recombinant H1 variants expressed in 
breast cancer cells, we have investigated the 
distribution of six H1 variants in promoters and 
genome-wide. H1 is depleted at promoters 
depending on their transcriptional status, and the 
extent of depletion differs between variants. 
Notably, H1.2 is less abundant 
than other variants at the transcription start sites 
of inactive genes, and promoters enriched in H1.2 
are different from those enriched in other variants 
and tend to be repressed. Additionally, H1.2 is 
enriched at chromosomal domains characterized 
by low guanine-cytosine (GC) content and is 
associated with lamina-associated domains. 
Meanwhile, other variants are associated with 
higher GC content, CpG islands and gene-rich 
domains. For instance, H1.0 and H1X are enriched 
at gene-rich chromosomes, whereas H1.2 is 
depleted. In short, histone H1 is not uniformly 
distributed along the genome and there are 
differences between variants, H1.2 being the one 
showing the most specific pattern and strongest 
correlation with low gene expression. 



INTRODUCTION 

Eukaryotic DNA is packaged into chromatin through its 
association with histone proteins. The fundamental repeat 
unit of chromatin is the nucleosome, which consists of 
146 bp of DNA wrapped around an octamer of core 
histone proteins H2A, H2B, H3 and H4. Linker histone 
H1 sits at the base of the nucleosome near the entry and 
exit sites and is involved in the folding and stabilization of 
the 30-nm chromatin fiber, allowing a higher degree of 
DNA compaction (1-4). Histone H1 is a family of 
lysine-rich proteins that consist of three domains: a 
short basic N-terminal tail, a highly conserved central 
globular domain and a long positively charged 
C-terminal tail. As in core histones, these tails are 
posttranslationally modified, mainly by phosphorylation, 
but also by acetylation, methylation, ubiquitination and 
formylation (5-10). Due to its role in the formation of 
higher-order chromatin structures, H1 has classically 
been seen as a structural component related to chromatin 
compaction and inaccessibility to transcription factors, 
RNA polymerase and chromatin remodeling enzymes 
(11,12). However, in recent years, the view that H1 plays 
a more dynamic and gene-specific role in regulating gene 
expression has been gaining strength. Knock-out or knock-down 
studies in several organisms have revealed that only a few 
genes change in expression on complete depletion of H1, 
some being up- and some downregulated (13-22). 

Unlike core histones, the H1 histone family is more 
evolutionarily diverse and many organisms have multiple H1 
variants or subtypes, making the study of these proteins 
more complex. In humans, the histone H1 family includes 
11 different H1 variants with 7 somatic subtypes (H1.1 to 
H1.5, H1.0 and H1X), three testis-specific variants (H1t, 
H1T2 and HILS1) and one oocyte-specific variant (H1oo). 
Among the somatic histone H1 variants, H1.1 to H1.5 are 
expressed in a replication-dependent manner, whereas H1.0 
and H1X are replication-independent. H1.2 to H1.5 and 
H1X are ubiquitously expressed, H1.1 is restricted to 
certain tissues, and H1.0 accumulates in terminally 
differentiated cells (23). 

It is still far from clear why there are so many H1 variants 
and great efforts have been made recently to elucidate 
whether they play specific roles or have redundant 
functions. Single or double H1 variant knock-out studies 
in mice did not identify any specific phenotype and this was 
attributed to the compensatory upregulation of other 
subtypes, favoring the view that there is redundancy 
between H1 variants (18). Despite these observations, 
there is growing evidence supporting the view that 
histone H1 variants do have specific functions. H1 
subtypes present cell type- and tissue-specific expression 
patterns and their expression is regulated over the course 
of differentiation and development (24-31). Different H1 
subtypes have also been differentially related with cancer 
processes (32-35). Chromatin binding affinity and 
residence time vary between H1 subtypes owing to 
differences mainly in the C-t tail, but also in the N-t tail (36-44). 
Furthermore, H1 subtypes are differently 
posttranslationally modified and these modifications modulate their 
interaction with different partners. This could explain some 
reported specific functions for certain H1 variants (45-57). 
Finally, global gene expression analyses in various cell 
types reveal that histone H1 variants control the expression 
of different subsets of genes, pointing to a specific role of 
H1 variants in gene regulation (58,59). 

To fully understand the function of histone H1 and its 
variants, several groups have explored the genomic 
distribution of H1 in vivo. Initial biochemical and 
microscopy-based approaches suggested a nonuniform 
distribution of H1 in the cell nucleus and found differences 
between variants (44,60,61). However, due to the lack of 
specific ChIP-grade antibodies for most of the H1 
variants, it has been challenging to map H1 variants 
precisely in the genome. Genome-wide 
studies with histone H1 started with ChIP-chip 
experiments in MCF7 cells using an antibody for total H1 (62) 
and continued using the DamID technique for the unique 
Drosophila histone H1 (63). Recently, some groups 
succeeded in obtaining the first genome maps for H1 
variants. The genome-wide distribution of human H1.5 
in IMR90 fibroblasts reveals that there are zones of 
enrichment in genic and intergenic regions of 
differentiated human cells, but not in embryonic stem 
cells, associated with gene repression and chromatin 
compaction (64). Furthermore, analysis of tagged H1c and 
H1d variants in knock-in mouse embryonic stem cells 
(ESCs) by ChIP-seq shows depletion of these variants 
from guanine-cytosine (GC)- and gene-rich regions and 
active promoters, and positive and negative correlations 
with H3K9me3 and H3K4me3, respectively, as well as an 
overrepresentation in major satellites (65). Finally, using 
DamID technology, the genomic mapping of human H1.1 
to H1.5 variants was also achieved in IMR90 cells (66). 
While H1.2 to H1.5 showed, in general, similar 
distributions and were depleted from CpG-dense and 
regulatory regions, H1.1 showed a distinct profile, 
pointing to a specific role of this variant in chromatin 
function. 

In this study, we investigated the distribution of the 
different H1 somatic variants in breast cancer cells by 
chromatin immunoprecipitation (ChIP) combined with 
quantitative polymerase chain reaction (qPCR), tiling 
promoter arrays and high-resolution sequencing. We 
combined the use of specific antibodies for some 
variants and hemagglutinin (HA)-tagged recombinant 
H1 variants expressed in cell lines to study the 
genome-wide distribution of H1.0, H1.2 to H1.5 and 
also H1X, a more recently identified and distantly 
related H1 variant. H1.1 was omitted from our analysis 
because it is the only somatic H1 variant absent from many 
cell types, including the cells used here. We also compared 
H1 distribution with the nucleosome distribution in our 
T47D human breast cancer cell lines, by H3 
immunoprecipitation. Our data support the view that all H1 variants 
occur across the genome, but also uncover specific features 
for H1.2, both at promoters and genome-wide. 
Interestingly, H1.2 enrichment correlates the most 
closely with gene repression, structural domains of 
chromatin such as lamina-associated domains (LADs) and 
regions of low GC content. Overall, the distribution of 
H1.2 along chromosomes differs from that of other 
variants including H1.0 and H1X, the two variants most 
structurally distant within the somatic H1 family. This 
work represents a comprehensive attempt to investigate 
for the first time the occurrence and relevance of the 
different histone H1 variants in the genome of human cancer 
cells, and provides valuable data to clarify our 
understanding of the functionalities and heterogeneity of H1. 



MATERIALS AND METHODS 

Cell lines and culturing conditions 

Breast cancer T47D-MTVL cells (carrying one stably 
integrated copy of luciferase reporter gene driven by the 
MMTV promoter), or derivative cells stably expressing 
HA-tagged H1 variants (H1-HA), were grown at 37°C 
with 5% CO2 in RPMI 1640 medium, supplemented 
with 10% FBS, 2 mM L-glutamine, 100 U/ml penicillin 
and 100 µg/ml streptomycin, as described previously 
(59). The HeLa cell line was grown at 37°C with 5% CO2 in 
Dulbecco's modified Eagle's medium GlutaMax medium 
containing 10% fetal bovine serum (FBS) and 1% 
penicillin/streptomycin. The MCF7 cell line was grown at 37°C 
with 5% CO2 in Minimum Essential Medium (MEM) 
containing 10% FBS, 1% penicillin/streptomycin, 
1% nonessential amino acids, 1% sodium 
pyruvate and 1% glutamine. 

For phorbol myristate acetate (PMA) experiments, 
serum-containing Roswell Park Memorial Institute 
(RPMI) 1640 medium was replaced by serum-free medium. 
After 24 h under serum-free conditions, cells were treated 
with PMA (100 nM) for the indicated time at 37°C. 

Stable expression of HA-tagged H1 variants 

Generation of T47D-MTVL cells stably expressing HA-tagged 
H1 variants was achieved as described previously (59). 
Briefly, human histone H1 variants were PCR-amplified 
from genomic DNA and cloned into pCDNA4-HA vector 
provided by D. Reinberg's group (NYU Medical School). 
The complete H1-HA cassette was cloned into the 
lentiviral expression vector pEV833.GFP provided by E. 
Verdin (Gladstone Institute) upstream of an internal 
ribosome entry site (IRES)-GFP cassette. Viruses were 
then produced and cells were infected with 
pEV833-derived lentivirus. HA-tagged H1 variant-expressing cell 
lines were selected by sorting in a FACSvantageSE or 
FACSCalibur machine (Becton Dickinson) for green 
fluorescent protein (GFP)-positive fluorescence. 

ChIP assays 

ChIP assays were performed as described previously (67). 
Briefly, cells were fixed using 1% formaldehyde, harvested 
and sonicated using a Diagenode Bioruptor to generate 
chromatin fragments between 200 and 500 bp. To 
perform the ChIP, 30 µg of chromatin was 
immunoprecipitated overnight using the indicated antibody. Rabbit 
IgG (Santa Cruz Biotechnology) was used as a control 
for nonspecific interaction of DNA. Input was prepared 
with 10% of the chromatin material used for an 
immunoprecipitation. Immunocomplexes were recovered 
using 20 µl of Protein-A magnetic beads from Millipore. 
Beads with bound antibody/protein/DNA complexes were 
washed, decross-linked at 65°C overnight and 
immunoprecipitated DNA was recovered using the IPure 
Kit from Diagenode. 

The following antibodies were used in this study: 
anti-H1.2 (Abcam 4086), anti-H1X (Abcam 31972), 
anti-H3 (Abcam 1791) and anti-HA tag (Abcam 9110). 

ChIP-qPCR 

Real-time PCR was performed on ChIP and input DNA 
using EXPRESS SYBR GreenER qPCR SuperMix 
Universal (Invitrogen) and specific oligonucleotides in a 
Roche 480 Lightcycler. ChIP values were corrected by 
the corresponding input chromatin sample. All 
oligonucleotide sequences used for the amplifications are 
available on request. 
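
The input correction can be illustrated with a short sketch. The text does not state the exact formula used, so the commonly applied percent-of-input calculation is assumed here, together with a PCR efficiency of about 100% and the 10% input fraction given under 'ChIP assays'. The function name and arguments are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.10):
    """Express a ChIP qPCR signal as a percentage of input chromatin.

    Assumption (not stated in the text): amplification efficiency is
    ~100%, so the signal doubles with every PCR cycle.
    """
    # Shift the input Ct to the value expected if 100% of the chromatin
    # (rather than `input_fraction` of it) had been amplified.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)
```

Under these assumptions, an immunoprecipitate that reaches threshold at the same cycle as a 10% input corresponds to 10% of input, and each additional cycle halves that value.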

ChIP-chip assays with Nimblegen promoter array 

At least 10 ng of ChIP and input DNA was amplified 
using GenomePlex Complete Whole Genome 
Amplification Kit (Sigma) and eluted with GenElute 
PCR Clean-Up Kit (Sigma). For ChIP-on-chip 
experiments we used the Nimblegen HG18 RefSeq Promoter 
3x720K array. One microgram of ChIP and input DNA 
was directly labeled by Klenow random priming with Cy5 
and Cy3 nonamers with the Nimblegen Dual-color DNA 
Labeling Kit following the manufacturer's user's guide 
(ChIP-chip arrays v6.2), and the labeled DNA was 
precipitated with 1 volume isopropanol. Hybridization 
mix including 15 µg of labeled DNA was prepared using 
the Nimblegen Hybridization Kit. Arrays were hybridized in 
the Nimblegen Hybridization System 4 Station for 16-18 h at 
42°C, and then washed in 1x Wash solution I, II and III. 
Hybridization buffers and washes were completed using 
the manufacturer's protocols. Arrays were scanned on a 
Nimblegen MS 200 Scanner per the manufacturer's protocol. 

ChIP-on-chip raw data were normalized and the differential 
intensity of each probe compared with input control was 
calculated using the Nimblegen software DEVA. The average 
fold change (ChIP versus input) in each 50 bp bin, for a 
window spanning 3.2 kb upstream to 800 bp downstream 
of RefSeq transcription start sites (TSS), was calculated 
using an in-house Perl script. LOESS-smoothed line plots 
around the TSS were produced using an in-house script 
written in the R statistical programming language. For the 
ChIP-signal heat map, the average fold change for 
each individual RefSeq transcript was calculated similarly and 
the data were then visualized with Java TreeView (68). Functional 
annotation of target genes based on Gene Ontology was 
performed using DAVID Software (Database for 
Annotation, Visualization and Integrated Discovery). 
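
The binning step can be sketched as follows. The original analysis used an in-house Perl script that is not reproduced in the text, so this Python version is only an illustration under stated assumptions: each probe is represented by its midpoint, probes carry a precomputed ChIP/input ratio, and the function and argument names are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 50      # bp per bin, as in the text
UPSTREAM = 3200    # bp upstream of the TSS
DOWNSTREAM = 800   # bp downstream of the TSS


def tss_profile(probes, tss_by_transcript):
    """Average probe ratios in 50 bp bins relative to the TSS.

    probes: iterable of (transcript_id, probe_midpoint, ratio).
    tss_by_transcript: dict of transcript_id -> (tss_position, strand).
    Returns {bin_start_relative_to_TSS: mean ratio}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for transcript_id, position, ratio in probes:
        tss, strand = tss_by_transcript[transcript_id]
        # Orient the distance so that negative values are always upstream.
        rel = position - tss if strand == "+" else tss - position
        if -UPSTREAM <= rel < DOWNSTREAM:
            # Floor division keeps negative offsets in the correct bin.
            bin_start = (rel // BIN_SIZE) * BIN_SIZE
            sums[bin_start] += ratio
            counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

The resulting per-bin averages are the kind of values that would then be smoothed (for example with LOESS) and plotted against distance to the TSS.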

ChIP-seq 

Library preparation for sequencing: ChIP and genomic 
library preparation was performed using standard 
Illumina protocols. Libraries were prepared with the 
ChIP-seq Sample Preparation Kit (Illumina) according 
to the manufacturer's instructions. Briefly, 10 ng of ChIP 
and input DNA were end-repaired to leave a 3'-dA overhang and 
then adapters were ligated to the ends of the DNA fragments. 
DNA fragments of the proper size (usually 100-300 bp, 
including adaptor sequence) were selected after PCR 
amplification, yielding a qualified library for sequencing. 

Sequencing, mapping and peak detection: Sequencing 
was performed with the Illumina HiSeq 2000 system. Raw 
sequence reads containing >10% 'N' bases, or in which bases 
with Q<20 accounted for >50% of the total, were removed and 
adaptor sequences were trimmed. Identified clean reads 
were uniquely aligned, allowing at most two mismatches, 
to the UCSC (The Genome Sequencing Consortium) 
reference genome (human hg18) using the program 
SOAP (version 2.21) (69). Sequences matching exactly 
more than one place with equal quality were discarded 
to avoid bias. Read length and read counts of each 
library are listed in Supplementary Table S1. SICER 
(version 1.1) (70), a peak-calling program for histone data, was used 
with the following parameters: redundancy threshold = 1, 
window size = 200, fragment size = 150, effective 
genome fraction = 0.75, gap size = 200, false discovery 
rate (FDR) = 0.01 and fold change of at least 2. 
Input-subtracted WIG files, normalized to total mapped library size, 
were produced from duplicate-removed aligned reads 
using the program javaGenomicsToolKit. 
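
The read-level quality filter described above can be written as a small predicate. This is a minimal sketch, assuming Phred quality scores have already been decoded to integers; the function and parameter names are hypothetical and the filtering tool actually used is not named in the text.

```python
def passes_quality_filter(sequence, qualities,
                          max_n_fraction=0.10,
                          min_quality=20,
                          max_low_quality_fraction=0.50):
    """Return True if a read survives the two filters given in the text.

    A read is discarded when more than 10% of its bases are 'N', or when
    bases with quality below 20 make up more than 50% of the read.
    """
    length = len(sequence)
    if length == 0 or length != len(qualities):
        return False  # malformed record; treated as failing (assumption)
    n_fraction = sequence.upper().count("N") / length
    low_fraction = sum(q < min_quality for q in qualities) / length
    return (n_fraction <= max_n_fraction
            and low_fraction <= max_low_quality_fraction)
```

Both thresholds are applied as strict "greater than" cut-offs, so a read with exactly 10% 'N' bases or exactly 50% low-quality bases is retained.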

Binding sites to gene feature annotation: Enriched 
peaks were annotated to the nearest gene (RefSeq genes) 
using the Bioconductor package ChIPpeakAnno (71). 
The distribution of enriched and depleted regions (peaks) across 
various genomic features, and the continuous ChIP signal 
profile of reads along the meta-gene, were 
obtained using the software CEAS (72) and in-house 
Python and Perl scripts. 

Regulatory regions, histone modification peaks, CpG 
and LADs abundance: The input-subtracted normalized 
average read density of H1 variants in each enriched 
location of regulatory regions, histone modification peaks, 
CpG and LADs was calculated, and box-plot 
representations were made using in-house scripts. As a control, 
a random sample of genomic windows with equal width 
was used to perform the significance test 
(Kolmogorov-Smirnov test). 
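
To make the comparison against random windows concrete, the two-sample Kolmogorov-Smirnov statistic can be computed as below. This is a sketch of the statistic only (the maximum distance between the two empirical distribution functions); it does not compute a P-value, and it is not the in-house script used in the study. The function name is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the empirical
    cumulative distribution functions of the two samples, e.g. read
    densities in annotated regions versus random windows.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDF difference can only change at observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A value of D near 0 indicates that signal in the tested regions is distributed like signal in random windows, whereas a value near 1 indicates almost complete separation.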

Publicly available genome-wide location data analysis: 
Public ChIP-seq data, which include H3K4me1, 
H3K4me2, H3K4me3, H3K27me3, H3K27ac, H3K9me3, 
H3K9ac, H3K36me3, P300, CTCF, FAIRE and DNase 
enriched genomic locations, were taken from the ENCODE 
project. CpG island genomic location information (hg18) 
and the coordinates of LADs (73) were taken from the UCSC 
database. Publicly available whole-genome data that were not 
available for the hg18 version were first remapped to 
the human genome version hg18 using the UCSC 
coordinate conversion tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). 

Overlap analysis: Overlap of genomic position range 
data was determined using BedTools (74). Two genomic ranges 
were considered to overlap if they shared at least one base. 
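
The one-base overlap criterion can be expressed directly. The sketch below assumes BED-style coordinates (0-based start, exclusive end), which is an assumption about the input files rather than something stated in the text; the function name is hypothetical.

```python
def overlaps(range_a, range_b):
    """Return True if two genomic ranges share at least one base.

    Each range is (chromosome, start, end) with a 0-based start and an
    exclusive end, as in BED files (assumed here).
    """
    chrom_a, start_a, end_a = range_a
    chrom_b, start_b, end_b = range_b
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a
```

With this convention, ranges that merely touch end to start (for example 100-200 and 200-300) do not count as overlapping, because they share no base.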

Average ChIP signal profile: For sequencing data, the ChIP 
signal around the center of each given genomic location was 
calculated using the normalized, input-subtracted average 
tag number in each 50 bp bin within a set window. The relative 
distance of each tag from the above-mentioned position and 
the average signal were determined using the 'Sitepro' script of 
the CEAS package (72), and plotting was done in the R 
programming language. 

Occupancy of H1 variants at individual chromosomes: 
Occupancy of H1 variants at each human chromosome was 
computed as the average of the input-subtracted ChIP-seq signal in 
50 bp windows. Heat maps and dendrograms were produced 
with in-house R scripts. Correlations between the 
occupancy of H1 variants (input-subtracted ChIP-seq 
signal average of 50 bp genomic windows) and gene 
expression and the gene-richness coefficient were computed with 
in-house R scripts. Gene expression for each chromosome 
was computed as the average of the expression of all the 
available expressed genes. The gene-richness coefficient 
(GRC) for each chromosome was calculated as the ratio 
between the percentage of total genes present in each 
chromosome and the percentage of base pairs of each 
chromosome to the total human genome. 
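
The GRC definition translates into a one-line ratio per chromosome. The original calculation was done with in-house R scripts, so the Python below is only a sketch; the function name and the toy numbers in the usage note are hypothetical and are not taken from the data.

```python
def gene_richness_coefficients(gene_counts, chromosome_lengths):
    """Compute the gene-richness coefficient (GRC) per chromosome.

    GRC = (fraction of all genes on the chromosome)
          / (fraction of all base pairs on the chromosome).
    Both arguments are dicts keyed by chromosome name.
    """
    total_genes = sum(gene_counts.values())
    total_bp = sum(chromosome_lengths.values())
    return {
        chrom: (gene_counts[chrom] / total_genes)
               / (chromosome_lengths[chrom] / total_bp)
        for chrom in chromosome_lengths
    }
```

For a toy genome with two equally sized chromosomes carrying 30 and 10 genes, the coefficients are 1.5 and 0.5; values above 1 mark gene-rich chromosomes and values below 1 mark gene-poor ones.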

Agilent expression arrays 

Total RNA was extracted using High Pure RNA isolation 
Kit (Roche) according to the manufacturer's instructions. 
cDNA was obtained from 100 ng of total RNA using 
Superscript VILO cDNA synthesis Kit (Invitrogen). 
High RNA integrity was assessed by Bioanalyzer nano 
6000 assay. For each sample, 100 ng of total RNA were 
reverse transcribed into cDNA with a T7 promoter and 
the cDNA was in vitro transcribed into cRNA in the 
presence of Cy3-CTP using the Low input quick Amp 
kit (Agilent). Labeled samples were purified using 
RNeasy mini spin columns (Qiagen). Then, 600 ng of 
cRNA were preblocked and fragmented in Agilent 
fragmentation buffer and mixed with Agilent GEx 
Hybridization mix. Hybridization mix was laid onto 
each sector of a subarray gasket slide and sandwiched 
against an 8 x 65K format oligonucleotide microarray 
(Human v1 Sureprint G3 Human GE 8x60k Microarray, 
Agilent design ID 028004) inside a hybridization chamber, 
which was hybridized overnight at 65°C. Subsequently, 
array chambers were disassembled while submerged in Agilent 
Gene Expression Buffer 1 and washed for 1 min in another 
dish with the same solution with a magnetic stirrer at 
200 rpm at room temperature, followed by 1 min in 
Agilent Gene Expression Buffer 2 with a magnetic stirrer 
at 200 rpm at 37°C and immediate withdrawal from the 
solution and air drying. Fluorescent signal was captured 
into TIF images with an Agilent scanner using 
recommended settings with Scan Control software (Agilent). 
Signal intensities were extracted into a tabulated text file 
using Feature Extraction software (Agilent) with the 
appropriate array configuration and annotation files. The 
normalized log2 intensities were obtained using the quantile 
method with normalized expression background 
correction in the Bioconductor Limma package in R. 

Human H1 variants nomenclature 

The correspondence of the human H1 variants 
nomenclature with gene names is as follows: H1.0, H1F0; H1.1, 
HIST1H1A; H1.2, HIST1H1C; H1.3, HIST1H1D; H1.4, 
HIST1H1E; H1.5, HIST1H1B; H1X, H1FX. 



RESULTS 

All H1 variants are nonspecifically present at gene 
promoters and are depleted from TSS in active genes or 
on induced gene activation 

To determine whether the genomic distribution of human 
histone H1 differs between variants, we used ChIP 
combined with semiquantitative PCR (ChIP-qPCR), 
promoter array hybridization (ChIP-on-chip) and 
massive sequencing (ChIP-seq). Because there is a 
limited number of H1-variant-specific ChIP-grade 
antibodies (only H1.2 and H1X in our hands), we 
developed T47D-derived cell lines stably expressing 
HA-tagged versions of each of the five somatic H1 variants 
expressed in most cell types (H1.0, H1.2, H1.3, H1.4 and 
H1.5) (see 'Materials and Methods' section) (59). These 
cell lines proliferated similarly to parental cells (data not 
shown). HA-tagged H1 variants (H1-HA) were expressed 
at levels lower than or similar to their corresponding 
endogenous histone, comparably across the different H1 
variant-expressing cell lines, and they were incorporated 
into chromatin (Supplementary Figure S1). In 
ChIP-qPCR experiments, an anti-HA antibody was used to 
specifically pull down H1-associated chromatin fragments in 
cells expressing H1-HAs (Supplementary Figure S2). 
H1-associated chromatin included gene promoters, coding 
regions and repetitive DNA, irrespective of which 
H1-HA variant was immunoprecipitated (Supplementary 
Figure S3). A few differences were observed between 
variants, e.g. there was relatively less H1.3 but more H1.4 
and H1.5 at alphoid repeats. 

Figure 1. All H1 variants are present at gene promoters and depleted 
from TSS. (A and C) ChIP experiments were performed in 
T47D-derived cells stably expressing HA-tagged H1 variants, wild-type or a 
K26A mutant of H1.4, with anti-HA antibody and the abundance of 
IPed material was quantified by qPCR with oligonucleotides for the 
indicated promoters (-10 kb distal promoter or TSS), and corrected by 
input DNA amplification with the same primer pair. (B) ChIP 
experiments were performed in parental T47D cells with H1 variant-specific 
antibodies against H1.2 and H1X and the IPed material was quantified 
as in (A). (D) An H1 valley was preformed at the TSS of the JUN gene and 
increased on mitogenic stimulation. T47D cells were treated with PMA 
100 nM for 60 min or left untreated and ChIP was performed with H1.2 
and H1X antibodies. The abundance of IPed material was quantified 
by qPCR with oligonucleotides for the JUN promoter (-10 kb distal 
promoter or TSS), and corrected by input DNA amplification. 
Representative experiments performed in triplicate are shown. 

The specificity of H1 variant distribution was 
investigated in more detail at gene promoters previously 
shown to contain H1 in distal regions located 10 kb 
upstream of their TSS and depletion of H1 at the TSS 
(H1 valley) (62). All the H1 variants were detected at all 
distal promoter regions tested, in similar proportions, and 
a similar degree of H1 depletion was observed at the TSS 
of all genes for all the H1 variants, including an H1.4 
mutant (K26A) at a residue targeted by acetyl and 
methyl transferases and reported to be involved in 
recruiting chromatin proteins (Figure 1A) (5,6,46,75). Moreover, 
local depletion of H1 at TSS was also observed by 
immunoprecipitating endogenous histones with specific 
H1.2 and H1X antibodies (Figure 1B). The ChIP 
specificity of these antibodies was confirmed in H1.2 and H1X 
inducible knock-down cells (Supplementary Figure S4). 
Interestingly, the TSS-associated H1 valley was not 
observed at genes inactive in these cells, i.e. OCT4 and 
NANOG (Figure 1C), while the H1 valley was evident 
at genes being expressed, as indicated by mRNA 
accumulation measured by RT-qPCR. Moreover, the H1 valley 
correlated with H3K4me3 enrichment at the TSS 
compared with a 10-kb upstream region, an open 
chromatin state at TSS measured by 
formaldehyde-assisted isolation of regulatory elements (FAIRE)-qPCR 
and nucleosome depletion (H3 ChIP) (Supplementary 
Figure S5). Furthermore, under stimulating conditions 
H1 depletion at the TSS was increased at inducible 
promoters, such as steroid hormone responsive promoters 
(MMTV) or genes induced by mitogenic agents (JUN and 
FOS) (Supplementary Figures S6 and S7). Notably, 
in these early response genes, there was already an H1 
valley in noninducing conditions and this became deeper 
on stimulation (Figure 1D and Supplementary Figure S7). 

Extended depletion of H1 at promoters is dependent on 
the transcriptional status of the gene and shows 
differences between variants 

To explore the genome-wide distribution of the different 
H1 variants across gene promoters, we hybridized ChIP 
material obtained with variant-specific antibodies or 
corresponding to HA-tagged H1 variant-associated 
chromatin with a promoter tiling array containing probes for 
30 893 transcripts (-3200 to +800 bp relative to the TSS) arising 
from 22 542 human promoters. The average log2 ratio of 
probe intensity for all transcripts was plotted against the 
relative distance to the TSS for each variant and an H1 
valley close to the TSS was apparent in all cases. 
Interestingly, in the two H1.2 samples (endogenous H1.2 
and H1.2-HA), the valley was more pronounced and 
slightly shifted toward the TSS, compared with that for 
the other H1 variants (endogenous H1X and 
H1.0/3/4/5-HA) (Figure 2A). 

Subsequently, these ChIP-chip data were combined with 
gene expression data for ca. 20 000 of the transcripts, 
obtained with the parental cell line in a human expression 
array (Agilent) (Supplementary Figure S8), and heat maps 
representing binding intensity were constructed for each 
variant, ranking promoters from highest to lowest gene 
expression (Figure 2B). An H1 valley was clearly seen 
for at least the top 50-60% most highly expressed 
transcripts in all variants. Notably, the valley extended toward 
the least expressed genes in H1.2 samples. Then all the 
transcripts considered were divided into 10 groups from 
high to low expression, and the average log2 ratio of 
ChIP-chip probe intensity was plotted against the relative 
distance to the TSS for each expression group and each 
variant (Figure 2C). These graphs confirmed that H1 
depletion at promoters is dependent on the transcriptional 
status of the gene. The H1 valley around TSS was deeper 
and wider for H1.2 than for the other variants, irrespective 
of whether endogenous or HA-tagged histone was 
measured. In general, H1 depletion extended to some 
degree at least 1 kb upstream of the TSS of active genes, 
further than the predicted extent of the reported 
nucleosome-free region (NFR) that lies upstream of the TSS. To 
confirm this result, ChIP-chip for the core histone H3 was 
also performed and showed that H3 was depleted at active 
genes and more locally than H1 (Figure 2B and C). H3 
and all H1s except H1.2 presented a marked enrichment 
peak immediately downstream of the TSS, which may 
correspond to a positioned nucleosome as previously 
reported (76,77). ChIP-qPCR on selected promoters 
confirmed some of these observations, namely, in some 
repressed promoters there was high H1.0 but low H1.2 
content around the TSS (Supplementary Figure S9). 

Figure 2. The extension of H1 depletion at promoters is transcription status-dependent and variant-specific. (A) Average log2 enrichment ratio of 
ChIP-chip probe intensity for all transcripts was represented regarding the relative distance to TSS for each variant. (B) Heat maps of ChIP-chip 
probe intensity around TSS (-3200 to +800 bp) for 20 338 transcripts for which the expression rate was determined. Genes are ordered from 
highest to lowest gene expression. (C) Average log2 ratio of ChIP-chip probe intensity represented regarding the relative distance to TSS for all 
transcripts classified according to expression in 10 groups containing the same number of transcripts, from highest (EG1) to lowest (EG10) expression. 
Representative ChIP-chip experiments are shown. 

In addition to protein-coding genes, the promoter array 
contained 1145 noncoding transcripts, including structural 
RNAs and transcribed pseudogenes, that overall pre- 
sented a low expression rate compared with the 
complete transcriptome. An H1 valley at the TSS was 
only apparent on the ChIP-chip heat maps for 
endogenous and HA-tagged H1.2, in agreement with our 
observation that an H1.2 valley occurs even at weakly expressed 
promoters (Supplementary Figure S10). 

H1.2 abundance at distal promoters is a mark of 
transcriptional inactivity and negatively correlated with 
the presence of other H1 variants 

Notably, H1.2 abundance at distal promoter regions 
(-3200 to -2000 bp from TSS) was inversely proportional 
to gene expression, being more abundant at repressed 
promoters (Figure 2C). This was also observed to some extent 
for the other H1 variants and H3 with the exception of the 
ca. 10% most and least strongly expressed genes that 
showed the opposite trend. In agreement with this, when 
gene promoters were ranked from weakest to strongest H1 
enrichment at the distal promoter region, a negative 
correlation with gene expression was seen especially for H1.2 
(Figure 3A). Genes with the highest distal promoter H1.2 
content (top 10%) mainly fell among those with the lowest 
expression, whereas genes with the lowest H1.2 content 
(bottom 10%) fell among those with the highest 
expression (Figure 3A, right panel). This was partially true also 
for H1X but less evident for the H1-HAs. Gene ontology 
analysis of H1 variant-enriched (top 10%) or -deprived 
(bottom 10%) promoters revealed that different biological 
processes were regulated by the different variants in T47D 
cells. For example, genes with the lowest content of H1X 
at promoters included active genes involved in chromatin 
organization, and those with the lowest H1.2 content in 
these regions included genes involved with cell-cell 
signaling or regionalization. On the other hand, genes with the 
highest H1X and H1.2 content at promoters included 
those involved in pattern formation and repressed 
genes involved in sensory perception, respectively 
(Supplementary Table S2). 

Moreover, H1.2 abundance at distal promoter regions 
was inversely correlated with H3, H1X and H1-HA 
abundance, while H1.2-HA showed an intermediate 
pattern (Figure 3B and Supplementary Figure S11). 
This indicates that there is a preferential binding of H1.2 
in some promoters (mostly repressed genes) compared 
with the other variants, and vice versa, many promoters 
are devoid of H1.2 but contain other H1 variants. 

Venn diagrams were drawn for the top 10% genes with
high or low H1.2 and high or low H1X at the distal
promoter to identify genes presenting high2/lowX and
vice versa (Supplementary Figure S12). The largest
overlap was found for low2/highX promoters
(553 genes), mainly corresponding to expressed genes
(Figure 3C). Representative genes of the two groups were
randomly selected (TMEM204 and TUBGCP5 for low2/highX,
and COL4A3 and CUGBP2 for high2/lowX-containing
promoters) and used to confirm by ChIP-qPCR
that some promoters preferentially bind particular
variants (Figure 3D). Similarly, Venn diagram comparisons
of the top 10% genes with high or low H1.2 versus
high or low H1.0-HA showed that the largest overlaps were
low2/high0 with 716 genes, and high2/low0 with 276 genes
(Supplementary Figure S13). Taken together, our data
indicate that promoters with low H1.2 content are
loaded with large amounts of other variants, not only
exogenously expressed H1.0-HA but also endogenous
H1X. Expression analysis of such groups of genes found
that genes with low H1 variant content at distal promoters
are highly expressed, and vice versa, but also that H1.2
content is the strongest predictor of gene expression
(Figure 3C and Supplementary Figures S12C and S13C).
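
The selection of top and bottom promoter groups and their overlap can be illustrated with a minimal sketch. It is not the analysis code used in this study; the gene names, signal values and the `decile_sets` helper are hypothetical, and a 20% cutoff is used only so that the toy data set of 10 genes yields two genes per group.

```python
def decile_sets(scores, fraction=0.10):
    """Return (lowest, highest) sets of genes ranked by promoter signal.

    scores: dict mapping gene name -> H1 variant signal at the distal promoter.
    fraction: share of genes placed in each group (0.10 = top/bottom 10%).
    """
    ranked = sorted(scores, key=scores.get)       # ascending signal
    n = max(1, int(len(ranked) * fraction))       # group size
    return set(ranked[:n]), set(ranked[-n:])


# Hypothetical distal promoter signals for 10 genes (illustrative values only).
h12 = {"g1": 0.1, "g2": 0.2, "g3": 0.5, "g4": 0.6, "g5": 0.7,
       "g6": 0.8, "g7": 0.9, "g8": 1.0, "g9": 1.5, "g10": 1.6}
h1x = {"g1": 1.9, "g2": 1.8, "g3": 1.0, "g4": 0.9, "g5": 0.8,
       "g6": 0.7, "g7": 0.6, "g8": 0.5, "g9": 0.2, "g10": 0.3}

low2, high2 = decile_sets(h12, fraction=0.20)
lowx, highx = decile_sets(h1x, fraction=0.20)

# Overlaps corresponding to the sectors of a Venn diagram.
low2_highx = low2 & highx   # low H1.2 and high H1X
high2_lowx = high2 & lowx   # high H1.2 and low H1X
```

The sizes of `low2_highx` and `high2_lowx` correspond to the overlap counts reported in the Venn diagrams.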

The universality of the relative H1.2/H1X abundance at
representative genes was tested in two additional cell lines
by ChIP-qPCR (Figure 3E and Supplementary Figure
S14). HeLa cells showed results similar to T47D, i.e.
H1.2/H1X ratios were higher in COL4A3 and CUGBP2
genes than TMEM204 and TUBGCP5, although ratios in
all genes were higher than in T47D reflecting a higher
relative abundance of H1.2 in HeLa cells (Supplementary
Figure S14). On the other hand, H1.2/H1X ratios in MCF7
were similar in all four genes, due to higher H1X signals in
COL4A3 and CUGBP2 genes. This result indicated that
relative abundances between variants at promoters were
not fully conserved between cell types, although the
patterns in T47D and HeLa were similar. In relation to
this, ChIP-chip of H1.2 and H1X in HeLa confirmed that
these two variants do not coexist at exactly the same distal
promoters (Supplementary Figure S15).

Next, we plotted heat maps of H1.2 abundance at the
promoters of genes ranked according to their position
along several human chromosomes (Figure 4A).

Figure 3. H1.2 abundance at distal promoter regions negatively correlates with gene expression and abundance of other variants. (A) Heat maps of
gene expression data for 20 338 transcripts ordered from lowest to highest H1 content at distal promoter regions (-3200 to -2000 bp relative to
TSS), for each of the H1 variants indicated. (Right panel) Expression levels of genes presenting the highest or lowest H1 variant content at distal
promoter are shown as a box plot. Significance was tested using the Kolmogorov-Smirnov test. Enrichment and depletion are marked with red and blue
asterisks, respectively. *P < 0.001. (B) Heat maps of H1 ChIP-chip probe intensity around TSS (-3200 to +800 bp) for 20 338 transcripts from which
the expression rate was determined. Genes are ordered from lowest to highest H1.2 content at distal promoter regions. Genes with the top or lowest
distal H1 content are indicated. These genes (2050 genes for each group, 10% of the total) were used to determine the number of coinciding genes as

Interestingly, several domains of high H1.2 abundance
were detected along these chromosomes, correlating with
clusters of differential gene expression. Notably,
chromosome 19, the most gene-rich chromosome, showed overall
high gene expression and low H1.2 content at promoters, as
did chromosome 17. On the other hand, the least gene-rich
chromosome, chromosome 13, presented low gene
expression and high H1.2 content (Figure 4A and Supplementary
Figure S16). The observed clustered distribution was well
conserved between cell lines, but differed between H1
variants. H1X and H1.0-HA abundances were not
clustered with the same pattern as gene expression. Rather,
these variants were abundant at promoters located on
gene-rich chromosomes 17 and 19, and depleted on the
gene-poor chromosome 13 (Supplementary Figure S16).
In summary, H1.2 content at promoters is the best H1
reporter of gene expression.

H1 variants are differentially depleted from regulatory
regions and enriched at CpG sites

To further explore whether the genomic distribution of H1
variants is heterogeneous, we combined ChIP of endogenous
H1.2, H1X, H3 and HA-tagged H1.0, H1.2 and H1.4
with high-resolution sequencing (ChIP-seq) of up to 50
million reads per sample (Supplementary Table S1). To
confirm the results obtained by ChIP-chip, we focused
first on the input-subtracted normalized average ChIP
signal obtained around coding regions of genes grouped
according to basal expression as before (Figure 5A).
Again, the H1 valley at the TSS depended on expression
rates and differences were seen between H1 variants.
Most notably, the abundance of H1.2 at the TSS of nonexpressed
genes was lower than that of the other subtypes, which
showed high levels toward nucleosome +1. Transcription
termination sites (TTS) also showed differences between
variants, being depleted of H1 subtypes except for H1.2.
Interestingly, the H1 content of gene bodies increased
toward the end and also depended on gene expression
rates. While H3 levels were uniform, those of H1
variants such as H1.2 were lower at the 5' moiety of
highly active genes (Figure 5A).
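
As an illustration of what an input-subtracted average signal means, the following minimal sketch averages ChIP minus input coverage bin by bin across a group of genes. It assumes that coverage has already been normalized for sequencing depth and binned along a common meta-gene axis; the function name and the numbers are hypothetical and do not reproduce the pipeline used here.

```python
def average_profile(chip_rows, input_rows):
    """Average input-subtracted signal per bin across a group of genes.

    chip_rows, input_rows: lists of equal-length lists, one row per gene,
    one value per bin along the meta-gene axis (depth-normalized coverage).
    """
    n_genes = len(chip_rows)
    n_bins = len(chip_rows[0])
    profile = [0.0] * n_bins
    for chip, inp in zip(chip_rows, input_rows):
        for i in range(n_bins):
            # Subtract the input control, then accumulate the group mean.
            profile[i] += (chip[i] - inp[i]) / n_genes
    return profile


# Two hypothetical genes, three bins each (upstream, TSS, downstream).
chip = [[4, 2, 4], [6, 2, 6]]
inp = [[2, 2, 2], [2, 2, 2]]
profile = average_profile(chip, inp)  # the middle bin shows the "valley"
```

Computing one such profile per expression group gives curves comparable in spirit to those in Figure 5A.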

In addition to the local displacement of H1 from active
promoters, H1 variants were markedly depleted from
other regulatory regions along the genome, namely,
CCCTC-binding factor (CTCF) binding sites corresponding
to insulators, and p300 binding sites associated with
enhancers, but were little affected at DNase hypersensitivity
sites and FAIRE-identified regions representing open
chromatin (Figure 5B and Supplementary Figure S17).
When we calculated the input-subtracted coverage of H1
variants across the peaks of selected core histone
modifications, depletion of H1.0 and H1.2, and to some extent of
H1.4 but not H1X, was associated with positive histone
marks linked to strong enhancers such as H3K4me1,
H3K4me2 and H3K27ac (Supplementary Figure S17).
H1 abundance at H3K4me3 and H3K9ac sites, enriched
at TSS of active promoters, differed between variants,
reflecting H1.2 depletion at the TSS of most genes but local
enrichment of the other variants immediately after the
TSS. No strong enrichment of H1 was found at negative
histone marks such as H3K9me3 or H3K27me3. It is also
worth noting that H1.2 abundance was lower at active
marks than at those related with repression and chromatin
compaction, in agreement with the observed correlation
between H1.2 content and gene repression.

Next, we investigated whether the location of H1
variants coincided with CpG regions across the genome.
As seen in Figure 5C, H1.0, H1X and H1.4 were clearly
overrepresented in CpG regions compared with H1.2.
Because CpG regions are mostly localized at gene promoters,
this finding may reflect the overall higher abundance of
those variants compared with H1.2 around TSS, particularly
at weakly expressed genes. Alternatively, it is not
possible to rule out a certain relationship between H1.0
(and other variants apart from H1.2) and CpG or DNA
methylation.

Differential prevalence of H1 variants along the genome

To further correlate ChIP-chip data of H1 abundance at
promoters with ChIP-seq signals, regions of clustered
genes with high H1.2 content such as the ones marked
with asterisks in Figure 4A (chromosomes 1 and 12) were
explored for input-subtracted H1 variant content using the
UCSC genome browser (Figure 4B). The whole domain was
enriched in H1.2 ChIP-seq signal compared with neighboring
regions, indicating that H1.2 enrichment was not limited
to the promoters of repressed genes therein. Interestingly,
this domain was characterized by low GC content and the
presence of LADs reported to anchor chromatin segments to
the nuclear periphery (73). LADs are typified by low
gene-expression levels, representing a repressive chromatin
environment. Notably, the distribution of the other variants
analyzed by ChIP-seq was not as clearly delimited to this
domain as H1.2. Further examination of H1 variant signals
across several regions containing LADs using the UCSC
genome browser showed that H1.2 was the variant most
strongly correlated with LAD positions and had fairly
well delimited borders of enrichment (Supplementary
Figure S18). When the input-subtracted coverage of H1
variants across LADs was calculated, H1.2 was the only
variant showing enrichment (Figure 4C).

We then examined individual chromosomes for the
presence of the input-subtracted signal of the different H1
variants. Abundance of H1 was heterogeneous along



Figure 3. Continued

shown in Supplementary Figures S12 and S13. (C) Expression levels of coinciding genes in the comparisons between genes presenting the highest or
lowest H1.2 or H1X (h2/l2/hX/lX, respectively) content at distal promoter are shown as a box plot. The number of common genes for each
comparison is indicated. Significance was tested using the Kolmogorov-Smirnov test. **P < 0.001 and *P < 0.005. (D) ChIP-qPCR confirmed
that some genes are enriched in H1.2 or H1X at distal promoter. TMEM204 and TUBGCP5 genes were randomly chosen among the group of
genes presenting low H1.2 and high H1X (553 genes), and COL4A3 and CUGBP2 genes among the genes presenting high H1.2 and low H1X (189
genes) (see Supplementary Figure S12). After ChIP-qPCR of H1.2 and H1X abundance at distal promoter regions of these genes in T47D cells, the
relative ratio H1.2/H1X was calculated. (E) The differential ratio between H1.2 and H1X abundance at selected genes observed in T47D cells is
conserved in HeLa cells but not in MCF7. Representative ChIP-qPCR experiments performed in triplicate are shown.

Figure 4. H1 variant content at gene promoters along human chromosomes and relation of H1 variants with LADs and GC content. (A) Heat maps
of H1.2 ChIP-chip probe intensity around TSS (-3200 to +800 bp) for genes ordered according to their position along several human chromosomes.
Gene expression levels for each gene in T47D cells are represented on the left in two different ways (as a heat map and graphical representation of log2
ratios). A gene richness coefficient (GRC) for each chromosome, calculated as the ratio between the percentage of genes present in each chromosome and the percentage of
base pairs of each chromosome to the total human genome, is indicated. The centromere location is marked with a triangle. Regions of interest are
marked with an asterisk and viewed in the UCSC genome browser in (B). (B) Distribution of H1 variants along selected regions of chromosome 1
and 12. Input-subtracted H1.2 and H1X ChIP-seq signal viewed in the UCSC genome browser together with GC content, RefSeq genes, H3K4me3

chromosomes, showing extensive patches of enrichment or
depletion of H1 compared with the input (Supplementary
Figure S19). Interestingly, the H1.2 pattern was the most
different from the other variants, endogenous H1X and
HA-tagged variants showing patterns that were similar to
each other and to that of H3, while the pattern for
HA-tagged H1.2 was more similar to endogenous H1.2 than to
other HA-tagged H1s. It is worth noting that long genome
patches of low GC content were found to be devoid of all
H1 variants except H1.2, which was enriched. We next
performed genome-wide correlation analysis between the
input-subtracted H1 variant signal and GC content. Low
GC content was associated with high occupancy of H1.2
but low occupancy of the other variants, including H1X,
and vice versa (Figure 4D and Supplementary Figure S20).
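
The type of correlation reported in Figure 4D can be sketched as follows. This is only an illustration of Pearson's correlation coefficient between per-window H1 signal and GC content; the window values are invented for the example and the `pearson_r` helper is not part of the software used in this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical genomic windows: GC% and input-subtracted H1 signal.
gc_percent = [35, 40, 45, 50, 55]
h12_signal = [1.8, 1.1, 0.2, -0.9, -2.1]   # decreases as GC content rises
h1x_signal = [-1.2, -0.4, 0.1, 0.6, 0.9]   # increases as GC content rises

r_h12 = pearson_r(h12_signal, gc_percent)  # negative for this toy data
r_h1x = pearson_r(h1x_signal, gc_percent)  # positive for this toy data
```

A negative coefficient for H1.2 and a positive one for H1X would correspond to the opposite trends described above.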

Further comparison of the overall abundance of H1
variants at each individual chromosome revealed unique
patterns creating corresponding clusters of chromosomes
and H1 variants (Figure 6). Interestingly, chromosomes
were clustered in a manner that was related to their
gene-richness and the overall expression of genes they
contained. Gene-rich chromosomes showed H1.0 and H1X
enrichment, and H1.2 depletion, whereas the opposite
was found at gene-poor chromosomes, in agreement
with the promoter ChIP-chip data described above.
Correlation analysis confirmed these conclusions
(Supplementary Figure S21). Notably, H1 variants were
clustered differently depending on whether they were
replication-independent (H1.0 and H1X) or synthesized over
the course of DNA replication only (H1.2, H1.4 and the
core H3 histone). Further, H1.0 and H1X had a more
heterogeneous distribution between chromosomes (data
not shown).
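
Gene richness is quantified in the figures by a gene richness coefficient (GRC), defined as the ratio between the percentage of genes on a chromosome and the percentage of genomic base pairs it contains. A minimal sketch of that calculation is given below; the chromosome names, gene counts and lengths are hypothetical placeholders, not real annotation values.

```python
def gene_richness_coefficient(gene_counts, chrom_lengths):
    """GRC per chromosome: (share of all genes) / (share of all base pairs)."""
    total_genes = sum(gene_counts.values())
    total_bp = sum(chrom_lengths.values())
    return {
        chrom: (gene_counts[chrom] / total_genes)
        / (chrom_lengths[chrom] / total_bp)
        for chrom in gene_counts
    }


# Hypothetical two-chromosome genome (placeholder numbers).
gene_counts = {"chrA": 300, "chrB": 100}
chrom_lengths = {"chrA": 50_000_000, "chrB": 50_000_000}

grc = gene_richness_coefficient(gene_counts, chrom_lengths)
# chrA holds 75% of genes in 50% of the sequence; chrB holds 25% in 50%.
```

A GRC above 1 indicates a gene-rich chromosome and a value below 1 a gene-poor one.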

Genomic annotation of enriched or depleted regions of
individual H1 variants shows that H1.2 is associated with
intergenic regions and repressed genes

Next, we searched specific regions of the genome either
enriched or depleted for each H1 variant signal over
input DNA with a fold change >2 using SICER
software (Supplementary Table S3). Most H1-enriched
regions were inside genes (arbitrarily defined as from
-5 kb upstream to +3 kb downstream of the TTS),
whereas H1.2 peaks were more abundant at intergenic
regions (Supplementary Table S3 and Supplementary
Figure S22). On the other hand, all H1-depleted regions
were more abundant inside genes, especially for H1.2.
Within genes, H1.2-enriched regions were disfavored at
promoters (-5 kb to +1 kb flanking TSS) compared with
other H1 peaks, whereas H1.2-depleted regions were
strongly favored, in agreement with ChIP-chip data
presented in Figure 2. In agreement with our aforementioned
data, the GC content in H1.2-enriched regions was lower
than in regions enriched for the other variants
(Supplementary Figure S20).
Next, we analyzed the overlap between H1-enriched and
depleted regions with CpG islands. CpG islands were
enriched at H1.2-depleted regions and at regions
enriched for the other variants, confirming the inverse
correlation between CpG islands and H1.2 described above
(Supplementary Figures S22 and S23). As expected,
regions overlapping with CpG sites were preferentially
located at promoters. For example, 42% of H1.0- or
H1X-enriched regions located at promoters overlapped
with a CpG island, while this was the case for only
4-8% of regions enriched in these variants located at
intergenic regions.

To identify H1 variant target genes we looked for genes
that had at least one H1-enriched region from -5 kb to
+3 kb from the TTS. H1.2 was the variant that was found
to have the smallest number of target genes
(Supplementary Table S3). Overlap analysis disclosed
the number of genes containing peaks of a single variant
or several variants (Supplementary Figure S24), and
expression analysis revealed that genes with only H1.2
peaks were less expressed than target genes containing
peaks of any other H1 variant (Figure 7A), in agreement
with data above showing lower expression of genes
containing elevated levels of H1.2 at distal promoter or coding
regions. In those genes, the peak tended to be outside the
promoter for H1.2, but at the promoter for the other
single variant target genes (Supplementary Table S3). On
the other hand, genes presenting H1.2-depleted regions
were highly expressed, while genes with depleted regions
of H1.0, H1.4 or H1X were expressed at lower levels than
the total transcriptome average (Figure 7A).
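
The assignment of target genes can be illustrated by a simple interval-overlap sketch. It assumes, for illustration only, that the search window runs from 5 kb upstream of the gene start to 3 kb downstream of the gene end and that strand is ignored; the coordinates, gene names and the `target_genes` helper are hypothetical.

```python
def target_genes(genes, regions, upstream=5000, downstream=3000):
    """Return genes with at least one enriched region in their extended window.

    genes: dict mapping gene name -> (chromosome, start, end).
    regions: list of (chromosome, start, end) enriched regions.
    """
    targets = set()
    for name, (chrom, start, end) in genes.items():
        lo, hi = start - upstream, end + downstream
        for r_chrom, r_start, r_end in regions:
            # Two intervals overlap if each starts before the other ends.
            if r_chrom == chrom and r_start < hi and r_end > lo:
                targets.add(name)
                break
    return targets


# Hypothetical genes and enriched regions.
genes = {
    "geneA": ("chr1", 10000, 20000),
    "geneB": ("chr1", 100000, 110000),
    "geneC": ("chr2", 10000, 20000),
}
regions = [("chr1", 6000, 7000), ("chr1", 50000, 51000)]

hits = target_genes(genes, regions)  # only geneA has a region in its window
```

Running the same function on the enriched regions of each variant, and intersecting the resulting sets, would give the single-variant and shared target genes.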

We further investigated whether the identified
H1-enriched regions fell within genes, proximal regulatory
regions or distal intergenic regions using CEAS software
(70). Again, the distribution of H1.2 differed most from
that of the other variants analyzed. H1.0-HA, H1X and H1.4-HA
peaks were overrepresented in promoters, UTRs, exons
and downstream regulatory regions, and underrepresented
in distal intergenic regions compared with the complete
genome, whereas H1.2-enriched regions were
overrepresented in intergenic regions and underrepresented in
exons and promoters (Figure 7B and Supplementary
Figure S25). Except for those for H1.2, H1 peaks were as
abundant in introns as in distal intergenic regions. On the
other hand, depleted regions were similarly distributed
across compartments in the different H1 variants, except
H1.2-depleted regions, which were more abundant at
promoters and less so at intergenic regions.

In summary, our data show that histone H1 is
not uniformly distributed along the genome and there
are differences between variants, H1.2 being the
one showing the most specific pattern and



Figure 4. Continued

(ENCODE average of 9 cell lines), CpG and LADs (data from Tig3 lung fibroblasts). (C) Box plots showing the occupancy of H1 variants
(input-subtracted ChIP-seq signal) within LADs. Significance was tested using the Kolmogorov-Smirnov test taking as a control a random sample of
windows with equal width to the LADs. Enrichment and depletion are marked with red and blue asterisks, respectively. *P < 0.001. (D) Genome-wide
correlation scatterplots of H1.2 and H1X variants versus GC content. X axes: average input-subtracted H1 signal (normalized to 1000 bp window). Y
axes: GC%. R: Pearson's correlation coefficient.

Figure 5. H1 is depleted from regulatory regions but present at CpG sites in a variant-specific manner. (A) Average, input-subtracted ChIP-seq
signal of H1 variants around gene bodies flanked by TSS and transcription termination site (TTS), grouped according to basal expression (10% of
total genes in each group). EG1 represents top expressed genes and EG10 genes with the lowest expression. Average for all genes is shown in black.
Genic regions are represented as a 3-kb-long meta-gene surrounded by 1 kb region upstream TSS and 1 kb downstream TTS. (B) Average,
input-subtracted ChIP-seq signal of H1 variants around the center of genomic CTCF and p300 binding sites (data from T47D cells). (C) Average,
input-subtracted ChIP-seq signal of H1 variants around the center of CpG islands (as defined in UCSC database).



strongest correlation with low gene expression in breast 
cancer cells. 



DISCUSSION 

Mapping of H1 variants by ChIP with variant-specific
antibodies and protein tagging uncovers differences
between H1.2 and the other variants in breast cancer cells

Herein, we have investigated the distribution of all
somatic histone H1 variants present in breast cancer
cells, i.e. H1.0, H1X and H1.2 to H1.5, by combining
ChIP with genomic technologies such as tiling promoter
array hybridization and high-resolution sequencing. After
testing several H1 variant-specific antibodies that we and
others have produced, only H1.2 and H1X commercial
antibodies were found to be useful in the ChIP-qPCR
experiments, and variant specific, as shown by performing
ChIP experiments in H1.2 and H1X knockdown (KD)
cells. Consequently, we generated stable cell lines
expressing HA-tagged versions of the H1 variants at protein
levels close to or below endogenous levels, despite
mRNA levels of exogenous H1 forms being higher (data
not shown). This suggests that H1 is tightly regulated
posttranscriptionally to control the overall levels of H1 and
the proportion between variants, which vary considerably
across cell types and cell lines. HA-tagging allowed us to
perform ChIP of all variants with the same antibody,

Figure 6. Human chromosomes show enrichment of different H1 variants in relation to average gene expression and gene richness. Heat map and
dendrogram of the occupancy of H1 variants (input-subtracted ChIP-seq signal average of 50 bp genomic windows) at individual chromosomes. Gene
expression and gene richness coefficient (GRC) of all chromosomes are shown as heat maps. GRC > 2 are shown in the same color.



ruling out the variability being due to diverse antibody
specificity or affinity. We found that all the H1 variants
studied are widely distributed along the genome and
within promoters with few differences between HA-tagged
H1.0, H1.3, H1.4 and H1.5. In contrast, endogenous
H1.2 presents striking differences. We rule out the
possibility that the differential distribution is due to
antibody usage or protein overexpression, as endogenous
H1X presented an occurrence similar to HA-tagged
variants and exogenous H1.2-HA resembled its endogenous
counterpart more closely than the other H1-HAs.

On this basis, we report that, in the cell line investigated,
H1.2 presents a variant-specific distribution and may have
differential functions. In fact, we reported elsewhere that
H1.2 KD produces unique effects, namely, cell cycle arrest
at G1 and decreased nucleosome spacing, not seen in other
H1 KDs, and these were observed not only in T47D cells
but also in MCF7 cells (59). Nonetheless, this feature was
not general, as it was not seen in other cell types tested,
including HeLa cells in which H1.2 is highly abundant,
indicating that H1 variants may have cell type-dependent
specific effects. In contrast, our data cannot rule out that the
other variants studied may have redundant functions and
distribution in breast cancer cells. A recent report on the
genomic distribution of Dam-H1.1 to H1.5 in IMR90 lung
fibroblasts found that H1.1 is the only subtype
showing divergent features (66). H1.1 is not expressed in
breast cancer cells or in many other cell types. Instead,
H1.2 and H1.4 are the only variants that have been
found in all cell lines tested to date (29,78).
Additionally, mRNA levels of these two variants are
maintained in nondividing cells and along differentiation,
compared with H1.3 and H1.5 levels that are reduced
(31,79). Although the sample is too small to generalize,
these results suggest that different H1 subtypes may play
different roles in different cell types, over the course of
development and in cancer cells, inviting further
investigation of H1 variant occurrence.

We have noticed that H1.2-HA was not distributed in
exactly the same way as endogenous H1.2 and showed
intermediate features somewhat similar to the other
H1-HAs. We believe that this recombinant protein has the
H1.2 structural features that direct it to the natural
H1.2-occupied sites, but owing to its overexpression it
may also locate at distinct sites normally occupied by
other H1 variants. We have observed, by ChIP, that on
knockdown of endogenous H1.2, H1.2-HA occupancy
increased (data not shown), suggesting a relocation to
H1.2 sites. Overall, we believe that caution should be
taken when interpreting data generated with exogenous
histone variants fused either to the Dam domain or to
peptide tags.

H1 depletion from promoters and coding regions is more
pronounced than H3 depletion and shows differences
between H1 variants

Our analysis has also shown that all H1s are removed
from active promoters, with maximum depletion close to
TSS but extending several nucleosomes upstream, beyond
the reported NFR, and within the coding regions. These

Figure 7. Genomic annotation of regions found to be enriched or depleted of individual H1 variants and expression of target genes. (A) The
expression profiles of target genes containing enriched or depleted regions for a unique variant are shown as box plots. The profile of genes
containing both H1.2-HA and H1.2endo (replica 2) enriched or depleted regions is also shown (2HA & 2e_r2). Significance was tested using
the Kolmogorov-Smirnov test. Enrichment and depletion are marked with red and blue asterisks, respectively. **P < 0.001 and *P < 0.005. (B)
Genomic annotation of regions enriched or depleted of endogenous H1.2 or H1X. Pie diagram of distribution of H1 variant-enriched regions at
genes, proximal regulatory regions and distal intergenic regions. Promoter and downstream regions are defined as 3000 bp upstream TSS or
downstream TTS, respectively. The proportions of the H1.2 and H1X enriched or depleted regions in several genomic features were significantly different
from the whole genome proportions of those features (P < 2.2e-16). Significance was tested using in-house R scripts.



regions containing nucleosomes but not H1 may coincide
with H2A.Z and H3.3-containing nucleosomes, as both
H2A.Z and H3.3 have been reported to locate at active
promoters surrounding the NFR, where they positively
regulate transcription (80-82). Additionally, other
authors have observed weaker histone H1 binding in
H2A.Z-containing nucleosomes (83) and a negative
genome-wide correlation between H1 and H3.3 (63).
These observations support the view that H1 removal is
part of the chromatin remodeling events that occur on
promoter activation to facilitate binding of transcription
factors and the RNA polymerase machinery (49,84-86).
Furthermore, the shape of the H1.2 (and H1.2-HA) valley
at the TSS in ChIP-chip and ChIP-seq data (Figures 2 and
5) was slightly different from that of other H1 variants.
Unlike the signals for other variants, the H1.2 signal did
not show local enrichment immediately after the TSS. This
local enrichment may coincide with a well-positioned
nucleosome (+1), flanked by phased nucleosomes. This
indicates that such a nucleosome may contain any H1
variant except H1.2. Additionally, H1.2 was not
abundant around the TSS of repressed genes, suggesting
that the TSS of genes carry epigenetic marks that include the
absence of H1.2. Overall, we have shown a strong exclusion
of H1.2 from the TSS of most genes.

Interestingly, we have found that immediate-early
responsive promoters, under nonstimulating conditions,
are prepared to respond to stimuli by keeping the TSS
free of H1, indicating that mechanisms other than
transcription initiation might dictate H1 clearance. In this
case, there is also histone H3 depletion at the TSS
compared with at the distal promoter in the absence of
stimuli, indicating that the NFR might be maintained to
allow rapid response after stimulation. Supporting our
hypothesis, it has been recently proposed that
transcription factors interact with DNA in a dynamic way, and
some transcription factor-DNA interactions are
established before the stimuli, especially at immediate-early
genes (87).

Comparison of H1 occupancy with H3 has shown that
all H1s except H1.2 follow the distribution of the core
histone, whether this represents nucleosome enrichment,
stability or defined positioning through the cell
population. Nonetheless, H1 depletion at promoters and
regulatory sites (CTCF or p300 binding sites) is more extensive
than that of H3, denoting that nucleosomes might be ejected from
delimited sites such as the NFR at the TSS, but H1 might
be depleted from larger regions encompassing several
nucleosomes. This is in agreement with previous reports
showing that dips of low H1 occupancy at TSS and
regulatory sites are not due to a lack of nucleosomes as they
show enrichment of the core histone variant H3.3 (63).
Moreover, at coding regions, the differential content of
H1 in active versus repressed genes is more pronounced
than that of H3, especially toward the 5' end of genes.
Consequently, gene-rich domains might adopt an overall
decondensed chromatin structure. Nonetheless, at active
genes H1 is less abundant in promoters than in coding
regions, indicating that H1 presence might be more
restrictive for transcription initiation than for elongation.

Initial ChIP-qPCR experiments indicated that all H1
variants were present at all tested promoters.
Nonetheless, hybridization of ChIP material with a
promoter array revealed that promoters might present
differential H1 variant abundance (Figure 3). The most
striking difference is between H1.2 and the other H1s,
including H1X. Subsets of genes with the highest
abundance of one variant and the lowest of another have been
identified, i.e. those with a high or low H1.2/H1X ratio.
Overall, expression of genes presenting these features is
different, relating H1 variant content with gene
expression. Notably, the relative abundance of H1.2 and H1X
in the selected promoters was conserved in the distant
HeLa cell line, but not in MCF7 cells. Thus, we propose
that the relative promoter abundance of H1 variants may
be related to, among other factors, their relative H1
variant content in a given cell type.

Two types of H1-containing chromatin are present in
breast cancer cells with different association with gene
density and expression

The negative correlation observed between gene activity
and H1.2 content found at promoters extended upstream
toward the whole genomic region. Patches of H1.2
enrichment seem to be associated with gene repression,
gene-poor regions (including entire chromosomes, such as
chromosome 13), low GC content or LADs, features
related to chromatin compaction (Figure 4). Moreover,
H1.2-enriched regions were frequently found at intergenic
regions. Similar results were found in previous studies,
linking histone H1 to repressive and compacted regions
of the genome and suggesting a role for H1 in 3D
organization of the genome. Some of these features were
described by Cao et al. for mouse H1c-Myc and H1d-FLAG
in ESCs, the closest orthologs of human H1.2 and H1.3,
by Li et al. for human H1.5 in differentiated IMR90
fibroblasts, and by Izzo et al. for human Dam-H1.2 to
H1.5 in IMR90 cells also (64-66). However, in the last
of these, H1.1 presented a DamID binding profile
distinct from the other subtypes that, to some extent,
resembles the distribution of H1 variants other than H1.2 in our
analysis in breast cancer cells, that is, they were more
closely associated with higher GC content, genes, their
promoters and CpG islands, and were not enriched in LADs.
Interestingly, in the study of Cao et al., when single peaks
for H1c and H1d in mouse ESCs were compared, H1d
(H1.3) was more closely related to GC-rich sequences
and LINEs, and H1c (H1.2) to AT-rich sequences,
Giemsa-positive regions and satellite DNA. It is
conceivable that there are at least two groups of H1 variants with
different distributions in each cell type, such that taken
together histone H1 variants cover the whole genome,
being present in most of the nucleosomes.

Whether a single variant may present distinct features in
different cell types rather than having intrinsic properties
is an intriguing question. Factors involved may be the
relative and absolute abundance of each variant and
whether a genome needs more plasticity or is progressively
silenced, i.e. pluripotency versus terminal differentiation.
In this sense, Li et al. described the existence of zones of
H1.5 enrichment in differentiated fibroblasts but not in
ESCs (64), and it has been reported that architectural
proteins, such as HP1 and H1, are hyperdynamic and
bind loosely to chromatin in ESCs (88,89). Additionally,
we have previously reported progressive changes in the
expression and abundance of H1 variants over the
course of differentiation of human embryonic stem cells
and of reprogramming of differentiated cells to induced
pluripotent stem cells (iPS), i.e. the opposite direction (31).
Thus, considering the importance of H1 in chromatin
structure and compaction, differential expression and/or
distribution of H1 variants could mediate the transition
between different chromatin states, and explain the more
'open' chromatin state of undifferentiated cells, which
contributes to the maintenance of pluripotency by
creating a poised chromatin state that leads to rapid
activation of lineage-specific genes when differentiation is
induced. In fact, it has been proposed that different
'anti-silencing' mechanisms, including incorporation of
specific histone variants such as H3.3, are involved in
the maintenance of open chromatin in ES cells (90).

Cancer is another cellular state in which global chromatin 
rearrangement is observed. In fact, alterations in 
nuclear morphology are one of the characteristics of 
cancer cells. Tumor-derived cells accumulate genetic 
and/or epigenetic differences compared with nontumor 
cells, and chromatin is reorganized, leading to altered 
gene expression programs and higher plasticity. 
Hallmarks of cancer include dedifferentiation and gene 
dysregulation. DNA methylation and histone modifications 
are two epigenetic mechanisms that are altered in 
cancer cells. Moreover, large organized chromatin K 
(lysine) modifications are reduced in cancer (91), and 
genes encoding proteins of the nuclear membrane 
present altered expression in many cancer types (92), 
indicating that LADs might be partially disorganized in 
cancer in accordance with the large-scale chromatin 
decondensation. Thus, it is conceivable that the distribution 
of histone H1 variants in such reorganized nuclei could 
differ from that observed in nonmalignant 
cells. In turn, this could be the reason why in our study 
most of the H1 variants in genome regions were found to 
be associated with more active and open chromatin. 
Moreover, given the association of H1 with LADs 
reported here and by Izzo et al., we hypothesize that H1 
could be a key player in establishing LADs in normal cells, 
and could also participate in the rearrangement of such 
domains in cancer cells due to a different prevalence of H1 
variants within these domains. Alternatively, LAD 
reorganization in cancer cells could cause H1 variant 
redistribution in these genomic domains. 

Tumor cells are characterized by a different methylome 
from that of normal cells [reviewed in (93)]. There is both 
global CpG hypomethylation, causing genomic instability, 
and hypermethylation of particular promoters including 
tumor-suppressor genes. In our analysis, we found that 
CpG islands contain H1.0, H1X and to a lesser extent 
H1.4, but not H1.2. This might reflect the relative 
abundance of these variants at promoters and suggests 
that promoter occupancy by H1 variants other than 
H1.2 is more permissive for transcription regulation in 
breast cancer cells. Alternatively, as H1.2 prevalence in 
intergenic CpG islands is also lower than that of other 
variants, we cannot rule out a direct role of the different 
H1 variants in CpG island regulation in breast cancer 
cells. 

Similarly, within a long region of genomic sequence, 
genes are often characterized by having a higher GC 
content than the background GC content of the entire 
genome. We found that H1 variants except H1.2 are 
associated with higher GC content regions, consistent 
with the preferential location of H1-enriched regions 
within genes. H1.2 presents an inverse correlation with 
GC content at a genome-wide level and H1.2-enriched 
regions associate with lower GC content than other 
variants. In our analysis, H3 also associates preferentially 
with higher GC-content regions, in agreement with reports 
describing greater nucleosome-space occupancy 
coinciding with active transcription and higher GC 
contents (94). 

Altogether, it seems that H1 variants are differentially 
associated with CpG islands and GC content in breast 
cancer cells. Our data are not completely consistent with 
previous reports showing low amounts of H1 in CpG 
islands (65,95). However, mouse H1d was more closely 
associated with GC-rich regions than H1c in the study 
of Cao et al. (65). Additionally, another study showed 
H1 variant-dependent interaction with DNMTs (96). In 
that study, it was found that, unlike other H1 variants, 
H1c (H1.2) does not interact with DNMT1 and 
DNMT3B. Based on the differential association of H1 
variants with CpG islands and GC-rich regions in T47D 
breast cancer cells, we hypothesize that a redistribution of 
most histone H1 variants in cancer may help to establish 
not only a differential chromatin state, but also an altered 
methylation pattern. In fact, H1 variants are differentially 
related to several types of cancer (33,97). Additionally, 
comparison of human mammary epithelial cells with 
breast cancer cell lines including T47D (98) showed 
global massive hypomethylation at CpG-poor regions, 
and hypermethylation at CpG-rich gene-related regions, 
proximal to the TSS, where local enrichment of all H1 
variants except H1.2 is observed in our data. Moreover, 
hypomethylated regions in breast cancer cells coincide 
with repressive chromatin, gene silencing, repressive 
histone posttranslational modifications (PTMs), intergenic 
regions and LADs (99), which in turn coincide with the 
enrichment of H1.2 found in our analysis. Further 
investigation of the DNA methylation profile of T47D breast 
cancer cells could confirm a differential role of H1 variants 
in establishing or maintaining DNA methylation in breast 
cancer. 

Chromatin containing H1 variants other than H1.2 
might support a level of compaction that facilitates a 
rapid conversion into either an active or a repressed 
state and, consequently, these variants are allowed at 
the TSS of genes before activation. In fact, a particular 
posttranslational modification in H1.4 (K34Ac) has been 
found to locate around the TSS of active genes (49). 
In contrast, we have described that H1.2 occupancy at distal 
promoters is the best predictor of gene repression. 
Moreover, genes presenting H1.2-enriched regions are 
clearly less strongly expressed than average. This study 
points toward considering H1.2 a repression 
mark associated with closed chromatin. 
In this regard, H1.2 has been found to be included in a 
p53-containing repressive complex in HeLa cells (50), and 
murine H1.2 has been found to be developmentally 
upregulated in the retina, promoting facultative 
heterochromatin formation in mature rod photoreceptors (100). 

Several studies have compared the chromatin binding 
affinity and residence time on chromatin of the different 
H1 subtypes in different organisms or cell lines, as well as 
their nuclear localization, obtaining diverse, if not 
controversial, results on the functional heterogeneity of H1 
variants. In general, H1.2 is among the variants presenting 
intermediate or low affinity for chromatin and, 
consequently, elevated mobility. In contrast, H1.4 has been 
mostly associated with high affinity, low mobility and 
colocalization with heterochromatin (40,101-103). We 
do not fully understand how these properties may relate to 
or contradict our observation of H1.2 being enriched in 
repressed and gene-poor chromatin in breast cancer cells. 
Certainly, different experimental approaches performed in 
the same cell model would help to reconcile the 
different observations. 

There is increasing evidence of a 3D organization 
of the genome within the cell nucleus. Interphase 
chromatin is organized in large chromosome territories 
defined as 'topological domains', which can interact 
despite being several megabases apart (104,105). These 
domains are stable across different cell types and highly 
conserved across species. It has already been reported that 
genes embedded in these domains are in a transcriptionally 
similar state and associated with transcriptionally 
related histone marks and chromatin features. Hence, it 
is not unreasonable to speculate that H1 could be involved 
in the formation or maintenance of such domains due to 
its role in chromatin structure. High-throughput profiling 
of chromatin marks and components has recently made it 
possible to define chromatin states (106,107). In 
Drosophila cells, five principal chromatin types have 
been described, H1 being present in all of them in different 
proportions (107). Although this may reflect the general 
features of H1 occurrence, in cells presenting several H1 
subtypes a differential distribution of subtypes between 
different chromatin types may occur, as is suggested in 
our study. We have found that H1.2 is the variant most 
closely associated with LADs, low GC content and 
gene-poor regions and chromosomes that are normally located 
at the periphery of the nucleus, features related to 
chromatin compaction, while chromatin associated with the 
other variants presents features of a more plastic 
chromatin. Interestingly, gene-rich chromosomes, presumably 
with a more dynamic chromatin and histone H1 
exchange, and located toward the center of the nucleus, 
are enriched in H1 variants synthesized all through the cell 
cycle, namely H1.0 and H1X. It would be interesting to 
further analyze the colocalization of the different human 
H1 variants with chromatin marks and components that 
better define the diverse chromatin states, although these 
types of comparisons are limited by the availability of 
high-throughput data on the same or related cell types. 